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Implementation details of the coupled 
QMR algorithm 

Roland W. Freund and Noel M. Nachtigal 


Abstract. The original quasi-minim al residual method (QMR) relies on the three- 
term look-ahead Lanczos process to generate basis vectors for the underlying Krylov 
subspaces. However, empirical observations indicate that, in finite precision arithmetic, 
three-term vector recurrences are less robust than mathematically equivalent coupled two- 
term recurrences. Therefore, we recently proposed a new implementation of the QMR 
method based on a coupled two-term look-ahead Lanczos procedure. In this paper, we 
describe implementation details of this coupled QMR algorithm, and we present results 
of numerical experiments. 


1 Introduction 

Recently, we proposed a new Krylov subspace iteration, the quasi-minimal residual 
algorithm (QMR) [5], for solving general nonsingular non-Hermitian systems of 
linear equations 

Ax = b. (1.1) 

The QMR method has two main ingredients: the look-ahead Lanczos process, and 
a quasi-minimal residual condition. The look-ahead Lanczos algorithm is used to 
generate — with low work and storage requirements — basis vectors for the under- 
lying Krylov subspaces. Furthermore, look-ahead techniques are used to avoid 
possible breakdowns in the classical Lanczos algorithm [7], except for so-called 
incurable breakdowns. Once the Lanczos basis is constructed, the quasi-minimal 
residual property is used to select the QMR iterates from the Krylov subspaces. As 
was shown in [5], the QMR iterates are always well defined, and the quasi-minimal 
residual condition leads to a smooth and nearly monotone convergence behavior. 
In addition, thanks to the quasi-minimal residual property, it is possible to prove 
convergence results for the QMR algorithm. The result is a method with several 
desirable numerical and theoretical properties. 

In the original QMR algorithm, the look-ahead Lanczos method used gener- 
ates the basis vectors for the Krylov subspaces by means of three-term recurrences. 
It has been observed that, in finite precision arithmetic, vector iterations based 
on three- term recursions are usually less robust than mathematically equivalent 
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coupled two-term vector recurrences. Therefore, in [6], we proposed a new im- 
plementation of the QMR algorithm, based on a coupled two-term recurrence 
formulation of the Lanczos algorithm. Together with the derivation, we discussed 
in [6] the properties of the new implementation, and presented numerical results 
showing that the new method is more robust than the original QMR algorithm. 
In this paper, we discuss in more detail the implementation of the new algorithm; 
in particular, we focus on the implementation of the coupled two-term version of 
the look-ahead Lanczos algorithm. We also give several new numerical examples. 

The outline of the paper is as follows. In Section 2, we recall the three-term 
look-ahead Lanczos process that was proposed in [3]. Then, in Section 3, we review 
the coupled two-term look-ahead Lanczos algorithm that was proposed in [6], and 
in Section 4 we give implementation details of the new algorithm. In Section 5, we 
briefly recall how the QMR approach can be combined with the coupled Lanczos 
algorithm to obtain a new implementation of the QMR method, and in Section 6, 
we report results of numerical experiments with this new QMR algorithm. Finally, 
in Section 7, we make some concluding remarks. 

Throughout the paper, all vectors and matrices are allowed to have real or 
complex entries. As usual, = [m^] denotes the transpose of the matrix 
M = [m jjk ]. We use <r min (M) for the smallest singular value of M, while the 
vector norm ||a:|| := V x H x is always the Euclidean norm. We denote by 
K n (c y B) := span{c, Sc, 

the nth Krylov subspace of generated by c € and the N x N matrix B . 
Finally, it is always assumed that A is an N x N matrix, singular or nonsingular. 


2 The three-term look-ahead Lanczos algorithm 

The Lanczos process is a method that builds basis vectors for two Krylov sub- 
spaces, with low work and storage requirements. Given two starting vectors, v x 
and w x E C^, the algorithm computes two sequences of vectors, {t^}" =1 and 
{wj IjLj , such that, for n = 1, 2, . . . , 

span{v 1 ,v 2 ,...iv n } = K n {v x ,A), ^ ^ 

span{u;j,i« 2 ,...,u\,} = K n (w 1 ,A T ). 

In addition, the two sets of vectors are required to obey a biorthogonality relation. 
Ideally, one would like to impose the condition 

w„Vj = wj t> a — 0 for all j < n. (2.2) 

This is done in the Lanczos process, as proposed by Lanczos in [7]. However, it 
turns out that it is not always possible or numerically stable to construct vec- 
tors satisfying (2.2), as exact breakdowns (w%v n = 0) or near-breakdowns {w%v n 
is small in some sense) may arise. This poses a problem, since the construction 
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of the pair u n+1 and w n+1 obeying (2.2) involves division by w„v n . As a rem- 
edy, one relaxes the biorthogonality condition, requiring instead that, in case of 
a breakdown, the relations (2.2) hold only for some range of j up to, but not 
equal to, n. This leads to so-called look-ahead Lanczos algorithms, which skip 
over the exact and near-breakdowns. The original QMR algorithm is based on a 
look-ahead Lanczos method proposed by Freund, Gutknecht, and Nachtigal [3], 
which we briefly review next. 

Like the classical algorithm, the look-ahead Lanczos algorithm [3] generates 
vectors and {t£^}" =1 with (2.1). In addition, they satisfy the biorthogo- 

nality relation 

WfV n = D n , (2.3) 

where D n is a block diagonal matrix whose structure is discussed below, and 

v n := [v, v 2 ■■■ v n } and W n \=[w x w 2 u>J. 

This means that some of the vectors and do in fact satisfy the 

full biorthogonality (2.2). These vectors are called regular , and they form a sub- 
sequence and {u> nj }j = i> where 

1 =: n l < n 2 < • • ■ < n l < n < n /+1 , / := l(n). (2.4) 

All vectors that are not regular are called inner. The regular vectors are used 
to partition {fj}" =1 and {u> ; *}" =1 into blocks. By convention, one defines blocks 
V^\ of size N x h- y containing the regular vector v n and all inner vectors — if 
any — -between v nj and v ni+l : 

v W) = [v ni %+i ••• ]• 

A look-ahead step is then defined in terms of building such a block. Hence the 
integer / in (2.4) is just the number of look-ahead steps taken during the first n 
steps of the Lanczos algorithm. Moreover, hj is called the length of the jth look- 
ahead step. The structure of the sequence parallels that of the sequence 

{Vj so that V n and W n can be written as 

V n = [ VW vW] and W n = [W^ ^ 2) ••• W (!) ], 

and D n in (2.3) is given by 

D n = diag(Z)( 1 ) ) D ^\ . . . , D (J) := (W^) T V^\ 

The choice of whether to build a regular or an inner vector is determined at each 
step, based on the particular look-ahead strategy used. It should also be pointed 
out that, even though the look-ahead Lanczos algorithm can handle most break- 
downs, there remains a class of breakdowns, the so-called incurable breakdowns , 
which cannot be cured by look-ahead techniques. However, incurable breakdowns 
occur only in very particular circumstances, and they do not pose a problem in 
practice. Finally, since the scaling of the Lanczos vectors is not determined by 
(2.1) and (2.3), the look-ahead algorithm scales the vectors to have unit length: 

Ikll = IKII = 1 . n = 1 , 2 , . . . . (2.5) 
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The main point of the look-ahead Lanczos process is that vectors satisfy- 
ing (2.1) and (2.3) can be constructed by means of short block three- term re- 
currences. These recurrences can be written compactly as 

AV a = V n+1 H nt (2.6) 

A T W n = W n+1 r~ l +1 H n T nl (2.7) 

where H n is an (n + 1) x n block tridiagonal matrix, 

r„:=diag(7„7 2 .---.7n). where 7, == { if ]> l] ( 2 ' 8 > 

is a diagonal scaling matrix with positive diagonal entries, and Pj and ^ are scale 
factors used to ensure that and Wj , respectively, obey the scaling (2.5). Since the 
recurrences used to build f n ^ 1 and are short, the look-ahead algorithm has 

low work and storage requirements, making it an attractive method for building 
bases for Krylov subspaces. However, it has been observed that, in finite preci- 
sion arithmetic, three-term vector recurrences are less robust than mathematically 
equivalent coupled two-term recurrences. This was our motivation in proposing 
in [6] a different implementation of the look-ahead Lanczos algorithm, based on 
coupled two-term recurrences. Next, we review this algorithm. 


3 The coupled two-term 
look-ahead Lanczos process 

The coupled two-term Lanczos process is an alternate way of generating the Lanc- 
zos basis vectors! The algorithm generates, in addition to the Lanczos vectors 
and {u> ; }? =1 , a second set of basis vectors, {Pj}J=\ and i, such 

that, for 7i ~ 1,2,..., 

span{pj,p 2 ,...,p„} =K n (v 1 ,A), 
span{g 1 ,g 2l .. .,g n } = K n {w 1 ,A T ). 

For simplicity, we will sometimes refer to the Lanczos vectors and {wj} 1 - =1 

as the V-W sequence, and to the auxiliary vectors and {g,}" = , as the P-Q 

sequence. The four sets of basis vectors are generated using coupled two-term 
recurrences of the form: 

V n = P n U n , AP n = V n+l L n , 

w n = g n r- 1 £f„ r„, a t q u = w n+1 r;* 1 L„r„ 1 

Here, 

P n '~[Pl P2 • • Pn] ^ Qn'=[<h ?2 9n ] . 

*The discussion that follows will not cover all the details and will not justify all the statements 
made. For full details, we refer the reader to [6]. 
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while U n is an upper triangular n 
given by 

'1 «i2 «m 

<•„:= ° 1 : 

• ’• '■ u n- l,n 

„0 ■■■ 0 1 

and r n is the diagonal matrix defined in (2.8). As was shown in [6], the matrices 
L n and U n define a factorization of the block tridiagonal Hessenberg matrix H n 
generated by the three-term look-ahead Lanczos algorithm, 

H n = L n U n . (3.1) 

In addition, it is possible to reduce L n and U n to block bidiagonal matrices, by 
constructing the basis vectors p n and q n so as to be block A-biorthogonal. Here, 
similar to the V-W sequence, the vectors p n and q n are also constructed using 
look-ahead techniques. For example, we again have blocks P^\ 

pii) = [Pm, Pm ,+ 1 ]> 

where p m * is called regular, the other vectors in the block are called inner, and 
the indices rrij satisfy 

1 =: m 1 < m 2 < • • ■ < m k < n < k := k(n). 

The regular vectors p mj satisfy the A-biorthogonality condition 

qjAp mj = 0 for all i < m jt (3.2) 

while the inner vectors satisfy only a relaxed version of this condition. Once again, 
the structure of Q n parallels that of P n , and the A-biorthogonality of the two sets 
of basis vectors can be written as: 

Q T n AP n = E n = diag El% E := (QM) t AP**\ (3.3) 

Before we consider the implementation details, let us briefly discuss an outline 
of the algorithm. At each iteration, the process consists of the following four basic 
steps. 

Algorithm 3.1. (Overview of the coupled algorithm with look-ahead) 

1) Decide whether to construct p n and q n as regular or inner vectors , 

2) Compute p n and q n as either regular or inner vectors . 

3) Decide whether to construct r n+1 and w n+1 as regular or inner vectors . 

4) Compute v n+1 and u> n+1 as either regular or inner vectors . 


atrix and L n is an upper Hessenberg matrix, 


and L n 


hi i 
P2 
0 


12 


*22 

P3 


‘In 


Pn + 1 J 
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Steps 1) and 3) are the basis of the look-ahead strategy, and they each consist 
of three checks. Recall from (2.6) and (2.7) that the Lanczos vectors v n+l and 
^n+i can obtained from the previous Lanczos vectors by a block three-term 
recurrence. Similarly, it is possible to show that the vectors p n and q n also have a 
block three-term recurrence, of the form 

AP n .^P n G n _ x and A T Q n . l -Q n V~‘ l G n . l T n . lt (3.4) 

where 

G„-\ := U n L n -i- (3-5) 


The look-ahead strategy for the two pairs of sequences is then similar, and was 
first proposed in [3]. In Step 1), the algorithm checks whether: 


"WllPnll > 
n(A)\\q n \\ > 


< 7 'm,n(£ ( * ) ) > C P S > 


n — 1 


£ l(tU'n-l ), , in ^\ IIP, II, 

(3.6) 



n-1 _ . . 


£ -■ j(^t„_.) i „_ 1 |n g< ii. 

i=m fc _i 1 

(3.7) 


Here, eps is machine epsilon, and n(^4) is an estimate for the norm of A. The 
vectors p n and q n are built as regular vectors only if all three of the above checks 
are satisfied. Likewise, in Step 3), the algorithm checks whether: 

0-min(^ (,) ) > «P«. 


"(A) > £ |(^„)J, (3-8) 


»u> > £ t Of) 

t=nj_i '• 

and again, the vectors t> n+1 and u> n +i are built as regular vectors only if all three 
of the checks are passed. The motivation for these checks can be found in [3, 6]. 
Here, we will only note that the look-ahead strategy proposed will build regular 
vectors in preference to inner vectors, and thus it will take as few look-ahead steps 
as possible. 

Once the decisions in Steps 1) and 3) are taken, the next vectors p n and q n , 
and t? n+1 and u; n+1 , are built in Steps 2) and 4). For p n and q ny let k* denote the 
number of the row of the first possible nonzero element in the nth column of U n . 
It can be shown that 

= max {j | 1 < j < k and m ; - < max{l, n t — 1}} . (3.10) 

From (3.3), both the regular and the inner vectors have the same coefficients 

Pm*.™*— l.n. given by » 

tW:m*— 1.» = (diag (E«'\ 

.[Q(**> Q(**+»> ... QC'-Vf Av n . 


t\Ve denote x t:J = [z, 


^*+1 x jV • 


Implementation of the coupled QMR algorithm 7 


For the regular vectors, the coefficients f7 mfc;n _i jn are determined by the condi- 
tion (3.2), 

0n»:«-l ,n = {EW)-\QW) T Av n , (3.11) 

while for the inner vectors, the coefficients U mk:n ~ l n are arbitrary: 

tij n =Cm€C, for t = m t ,m i + l,...,n-l. (3.12) 

This completes the computation of the recurrence coefficients for p n and q n] and 
the vectors are then computed from 

n— 1 

P n = v n~ X P«' U *n> 

i=m k + 
n — l 

9n =V>n- X 9i U <n(T„/Ti)- 
i-m k * 

For u n+1 and u; n+1 , let /* denote the number of the row of the first possible 
nonzero element in the nth column of L n . It can be shown that 

t* = max {j | 1 < j < l and rij < m k } . (3.13) 

Again, by (2.3), both the regular and the inner vectors have the same coefficients 
given by 

= (diag(D<'*>, DV-V))- 1 

,[^('*) ... wV- 1 ')} 1 ' Ap n . 

For the regular vectors, the coefficients L nrn n are determined by the condi- 
tion (2.2), 

i„ 1 :n l „=P (0 )" 1 (^ ( ' ) ) T ^Pn, (314) 

while for the inner vectors, the coefficients L nrn n are arbitrary: 

J,„ = rj in € C for i — n (l n, + 1, . . . ,n. (3.15) 

Once the recurrence coefficients for v n+1 and u> n+1 are computed, the vectors are 
constructed by scaling the vectors 

n 

«n+l = ^Pn ~ X v i ( in> 
i=rii* 

n 

U'n + l = A T q n - X W , l in(7n/7i)> 

i—n t * 

to have unit length. 
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4 Implementation details 


We now turn to a detailed description of the implementation of Algorithm 3.1. If 
the coupled two-term Lanczos process is run without any look-ahead steps, it will 
require two inner products at each step in order to compute all the coefficients 
of the recurrence formulas. Hence, the goal for the look-ahead implementation is 
to also require only two inner products per step for all the recurrence coefficients. 
Recall that the look-ahead strategy (3.6) and (3.7) for the P-Q sequence and 
the normalization (2.5) require a total of four norm computations, so that the 
implementation will require two inner products and four norms per iteration. 

To begin, let us introduce the auxiliary matrices 

F n :=WZAP n and F„ := Q T n AV n , 


whose columns are needed in (3.14) and in (3.11). In addition, we will make use 
of the following symmetry relations from [6] : 


D n T n = (A»r„) T , ( 41 ) 

E n T n = (E n T n f, (4.2) 

F„r n = (F„r n ) T . (4.3) 

The nth iteration of the implementation will update the matrices D n _ x , E n _ i, 

L n ~ i, b n _ l5 P n _ j, Qn- 1) V n ' ^fi’ Qn* 

V n+l , and W n+1 , respectively. We first list an outline of the algorithm as we have 
implemented it. 


Algorithm 4.1. (Coupled algorithm with look-ahead) 

0) Choose v x , w x G with ||v 1 ][ = \\wtW = 1, and compute wjv 1 . 
Set k = 1, m k = 1, / = 1, nj = 1. 

For n = 1,2,..., do : 

1) Update D n _ x to D n . 

2) Determine k* from (3.10): 

k* = max [j [ 1 < j < k and m ; < max{l,n, - 1}} . 

3) Compute F n l . n _ l from (4.4) below , using L n ~ l and D n l:n . 

Then compute F 1;n _ l n from (4.3). 

4) Check whether E ^ ts nonsingular : 

innerp = <r min (E {k) ) < eps. 

5) Compute the pari of {/ 1;nn that is determined by biorthogonality : 

u m , m<+1 -i, n = (E {i) )-HQ {i) ) T Av n 

= (F(*))- 1 F mi:m , +1 _ 1 , n) * = **,...,*- 1- 

If innerp, go to 6). Otherwise , set 

u mk , n . = (E^)-\Q W ) T Av n = (E^)- l F mk:n _ lin . 


Implementation of the coupled QMR algorithm 9 


6 ) 


7 ) 

8 ) 


9 ) 

10 ) 

11 ) 


12 ) 

13) 

14) 

15) 

16) 
17) 


Build the part of p n and q n that is common to both regular and 
inner vectors : 

m fc -l 

Pn = V n~ X 
m fc -l 

<ln= W n - X 9« U in(7„/7,)- 
If innerp, go to 11). 

Build G mfc:n - u-i an d check the coefficient G mfc _ i:n - i ( n-i- 
If innerp, go to 11). 

Build p n and q n as regular vectors ; 

n — 1 

Pn Pn ^ y Pilin' 
i-m k 

n — 1 

?n = <ln - X ?. tl .n(7n/7,)- 

«=mk 

Compute Ap n , q%Ap n , ||p n ||, and ||? n ||. 

Build and check the coefficient G mk . n _ i n . // innerp, go to 11). 

Set m k+l — n } k = Jb 4- 1, and 12). 

Choose the inner recurrence coefficients u in , * = m fc , . . n — 1, and 
6u:/d p n and <j n as inner vectors: 

n - 1 

Pn Pn ^ y Pilin’ 

i=m fc 

n- 1 

9n = 9n - X ?. U <n(7n/7<)- 
i-m* 

Compute Ap n , qlAp n , ||p n ||, and ||g n ||. 

7/ llPnll = 0) or Iknll = 0. then stoP- 
Compute A T q n . 

Update E n _ y to E n . 

Determine l* from (3.13): 

t* = max {j | 1 < j < l and n ; < m^} . 

Compute F l:n n from (4.5) below, using E n and U n . 

Check whether D W is nonsingular : 

innerv = **>{*») < eps. 
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18) Compute the part of L 1:nn that is determined by biorthogonality. 


L 




(£>« )~ l {W^) T A Pn 
(D^)~ l F n , ni+1 _ ltn , i = — 1. 


If innerv, go to 19). Otherwise, set 

L n , = (D^)~ l (W^) T Ap n = 

19) Build the part of u n+1 and u> n+1 that is common to both regular and 
inner vectors : 


ni-l 

^n-j-1 = Ap n ~ 'y ^ 
i=nj* 


ni-1 

*4+1 = A T q n - X w i l in('tnhi)- 

i=n |* 

If innerv, go to 24). 

20) Build H ni . n n and check the coefficient H nj _ i:n n . 
If innerv, go to 24). 

21) Build D n+1 and w n+1 as regular vectors : 

n 

4 + 1 — 4+1 — ^ ^ 

i=n, 


*4 + 1 =•», + ! - X] W Jin(~tnhi)' 

i=r»i 

Compute p n+1 = l n+hn = p n+ i||, £„+i = ||*4+ill- 
// ^> n+ i = 0 or 4 +1 = 0, then stop. 

Otherwise, set 7 n+1 = 7 n Pn+i/4+i) and compute u£ +1 i 5„ +1 . 

22) Build and check the coefficient H ni:n n+1 . If innerv, go to 24). 

23) Set n, +1 = n + 1, l = Z+ 1, and go to 25). 

24) Choose the inner recurrence coefficients l in> i = n lt . . . ,n, and 
build u n+1 and io n+1 as inner vectors: 

n 

4 + 1 = 4 + 1 - X V i l in> 

i=n t 


n 

*4 + 1 = *4+1 - X W i l in(y n /7i)- 

i=n, 


Compute p n + l = / n + l n = ||4+lll> 4+1 = ll*4+lll- 
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If Pn+1 = 0 or £ n+1 = 0, then stop. 

Otherwise, set j n+1 = ItnPn+l/Zn+i, and compute u£ +1 v„ +1 . 

25) Set 

«n + l = ®n+l /Pn+V w n +l = ™n+l/Z n +l’ 
w n+l v n+l = ™n+lV n +l/(Pn+lZ n +l)- 

Step 1. The diagonal term wfv n has already been computed directly, at the end 
of the previous step. Next, using 

Fn-i = K-iAP^ = Wf_ 1 V n L n _ l 

= Ai-l^l:n-l,l:n-l + ^i,n-l^l:n-l,n [0 *•’ 0 1], 

the remainder of the last column of D n is computed from D n _ X) F n _ 1} and L n _ 1 . 
The last row of D n is obtained by symmetry, using (4.1). 

Step 7. We build G mk . n _ 1 n _ x > which would be the coefficient of the P( k ) and 
Q( k ) blocks in the three-term recurrences (3.4) for p n and q n . Using (3.5), one has 

n 

Gi t n - 1 = ^ ] ^ij — 1 > * “ 771 k » • • - > n ~~ I ‘ 

J=» 

The coefficient G mh :mk ~ x n _i has already been built as part of Step 9) at the 
previous iteration. We then check (3.6)-(3.7), and set innerp to TRUE if at least 
one of the two checks fails. 

Step 9. We build G mk . n _ l n , which would be the coefficient of the P ^ and Q ^ 
blocks in the three-term recurrences (3.4) for p n+1 and It is straightforward 
to show that 

G mk:n _ 1 , n = (^ ) )- 1 (Q (fc) ) T ^Pn- 

Moreover, we have 

Ql-xAAp n = (. A T Q n _ l ) T A Pn = (Q„r- 1 tf n L„_ 1 r„_ 1 ) T Ap n 
= I‘n-lI’n-lUnI'n 1 QnAp n 
= T n . 1 Ll_ 1 U^T~ 1 [0 ••• 0 q„Ap n ] T 

= 2 T ±l n,n-x[0 ••• 0 qlAp n f. 

In 

We then check a subset of (3.6)— (3.7), and set innerp to TRUE if at least one of 
the two checks fails. 

Step 14. The diagonal term q„Ap n has already been computed directly, as part 
of Step 8) or Step 11). Next, using 

F n = W% AP n = T n UZ r ~ x QlAP n , (4.5) 

the remainder of the last row of E n is computed from 2? n _ 1? ^i ;n ,i:n-i> an( I U n . 
The last column of E n is obtained by symmetry, using (4.2). 
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Step 20. We build H ni:nn) which would be the coefficient of the and W ^ 
blocks in the three-term recurrences (2.6) and (2.7) for v n+l and w n+l . Using (3.1), 
one has 

n 

H in = 'Y J l ij u jn , 7 = 71,, ... ,71. 

J=* 

The coefficient H ni _ x:ni _ l n has already been built as part of Step 22) at the 
previous iteration. We then check (3.8)-(3.9), and set innerv to TRUE if at least 
one of the two checks fails. 

Step 22. We build H n . nn+1 , which would be the coefficient of the and 
W W blocks in the three-term recurrences (2.6) and (2.7) for v n+2 and tr n+2 * It is 
straightforward to show that 

1 = (D (,) rHW m ) T Av n+l . 

Moreover, we have 

W*A ® n+1 = ( A T W n fAv n+1 = (W r n+1 r-*. 1 L B l7 n r il ) T v n+1 
= T n U^LlT~ l +1 Wj +1 v n+1 
= T n U^LlT- l +1 [ 0 ■••0 ] T 

— — ^n+l,n [® 0 ^n+l^n+ll ■ 

7n+l 

We then check a subset of (3.8)— (3.9), and set innerv to TRUE if at least one of 
the two checks fails. 

We remark that the checks in steps 9) and 22) are actually slightly relaxed 
versions of (3.8)-(3.9), and (3.6)— (3.7), respectively, since the indices checked are 
only a subset of the full range appearing in (3.8)— (3.9) and (3.6)-(3.7). We also 
note that the algorithm above requires minimal inputs from the user. Recall that 
eps in steps 4) and 17) is machine epsilon. Furthermore, the estimate n(A) for the 
norm of the^matrix can be updated dynamically, as was done in [3]. 

The coupled Lanczos Algorithm 4.1 requires per iteration the computation of 
two inner products and four vector norms. We conclude this section by noting 
that, in Algorithm 4.1, the choice of the inner recurrence coefficients (3.12) and 
(3.15) is arbitrary. In our implementation of the algorithm, we have used 

when m k < n — 2, 
for i = m ki . . . , n — 3, 

when rif < n — 1, 
for i = . . . , n — 2, 

for the inner vector recurrence coefficients. 


u 


n — l,n 


= l. 


« n -2,n = 1, 
«in = 0, 
*nn = 1 , 
l n - l,n = *> 
hn = 
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5 The coupled two-term QMR algorithm 

We now consider the quasi-minimal residual approach and briefly outline how 
it can be combined with the coupled two-term look-ahead Lanczos algorithm of 
Section 3 to obtain a new implementation of the QMR method. We note that the 
QMR algorithm was proposed in [5] for the case of nonsingular linear systems (1.1). 
It was later shown by Freund and Hochbruck [4] that the algorithm can also be 
applied to singular systems, and that it always generates well-defined iterates. 
However, these iterates converge to a meaningful solution only for the special case 
of consistent systems with coefficient matrices A of index 1. Here, we consider 
the QMR method for the general case of N x N linear systems, with singular or 
nonsingular coefficient matrices. 

The QMR algorithm belongs to the family of Krylov subspace methods. Let 
x 0 G be an initial guess for the solution of (1.1), and r 0 = b — Ax 0 the 
corresponding initial residual, of length p Y = ||r 0 ||. Choosing v x = r 0 /p l as the 
starting right Lanczos vector for Algorithm 4.1, and w x with \\w x \\ = 1 as an 
arbitrary starting left Lanczos vector, one obtains the four basis sets, V n , W n) P n , 
and Q n) of which the ones of interest are V n and P nt related by: 

v n = P n R n and AP„ = V n+1 L n . 

Once the basis vectors are constructed, the nth QMR iterate is selected from the 
shifted Krylov subspace x 0 + K n (r 0 ,A) as 

*n = x 0 + p n y n > (5- 1 ) 

where y n E C" is defined by the quasi-minimal residual condition 

l|/n+l-^nynll= mi "„ ll/n+1 ~ L r>y\l ( 5 2 ) 

y€ C 

This isan(n-f-l)xn least-squares problem, where 

fn+i := ■ [ i o ••• o] T e R n+1 , 

and we have used the normalization (2.5) of the Lanczos vectors; otherwise, the 
least-squares problem above also involves a diagonal scaling matrix. 

Note that, by setting 

= (^n) _1 yn> 

and inserting in (5.2), one obtains the equivalent least-squares problem 

Wfn +1 - L n R n z n \\ = min ||/ n+1 - L n R n z\\, 
z E C 

which is exactly the least-squares problem solved by the QMR algorithm based 
on the three-term Lanczos process. Thus, the QMR iterates (5.1) are, in exact 
arithmetic, identical to the iterates of the original QMR algorithm [5]. However, 
as was shown in [6], in finite precision arithmetic, the coupled QMR algorithm is 
more robust than the three-term recurrence version. 

Like the original QMR algorithm, the new implementation has a number of 
desirable properties. Since they are equivalent, all the theoretical properties of the 
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three-term recurrence QMR algorithm carry over to the coupled version. The new 
method also shows a smooth and nearly monotone convergence curve. At all times 
during the iteration, an upper bound for the QMR residual norm is available at 
no extra cost, and, as the examples will show, this upper bound is a very good 
indicator of the convergence of the method. As in the original version, the QMR 
iterate has a short update formula, involving only one direction vector that can 
be updated with a two-term recurrence. This is slightly cheaper than in the old 
method, where the corresponding search directions for the update of the iterates 
had a three- term recurrence. Finally, the new version seems to be significantly 
more robust than the old version, though no theoretical results are available to 
explain the differences. For a full discussion of the properties of the two QMR 
algorithms, see [5, 6 ]. 


6 N umerical experiments 


In this section, we present a few numerical examples with the new implementation 
of the QMR algorithm. All these examples were run either on a Cray-2 at the 
NASA Ames Research Center or on a Cray Y-MP at AT&T Bell Laboratories, 
with a machine epsilon of about 5.0E— 29. 

In the plots below, we always show two curves, the computed scaled residual 
norm ||r n ||/||r 0 || (solid line) and the residual norm upper bound (dotted line). 
Recall that the upper bound is available at each step at no extra cost. 

Example 6*1. This example is taken from [2]. Here we consider the differential 
equation 

Lu = J on (0,1) x (0,1) x (0,1), (6.1) 


where 


- -l (•-*)■ -IKK HO 

+50(x + y + z)tt- + ( 7 — — ~r — — 25o) u, 

v dx \l + x + j/ + z ) 

with Dirichlet boundary conditions u = 0. The right-hand side / was chosen so 


that 


u = (1 - x)(l - y)(l - z)(l - e *)(l-e y )(l - e z ) 


is the exact solution of (6.1). We discretized (6.1) using centered differences on a 
40 x 40 x 40 grid with mesh size h = 1/41. This leads to a linear system of order 
N = 64000 with 438400 nonzero entries. The starting vector w 1 was a random 
vector, and the initial guess x 0 was zero. The example was run with a right SSOR 
preconditioner [1], with a? = 1.0. The algorithm stagnated at 1.6E— 25 after 119 
steps. For the V-W sequence, it built 4 blocks of size 2, and forced a block closure 
once. For the P-Q sequence, it built 2 blocks of size 2, and forced a block closure 
once. 
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Figure 1: Convergence curves for Example 6.1. 


Example 6.2. This example was provided to us by V. Venkatakrishnan [10], 
from the Numerical Aerodynamic Simulation Group at the NASA Ames Research 
Center. It comes from an unstructured 2-D Euler solver, and it corresponds to the 
system at the beginning of time-stepping. The linear system is of order N = 62424 
with 1717792 nonzero elements. The right-hand side 6 and the starting vector Wj 
were both random vectors, while the initial guess x 0 was zero. Once again, the 
example was run with a right SSOR preconditioner, with w = 1.0. The algorithm 
was stopped once it reached 1.0E— 20, after 107 steps. For the Y-W sequence, it 
built 3 blocks of size 2, and 1 block of size 3. For the P-Q sequence, it built no 
blocks, but forced a block closure once. 

Example 6.3. This example was taken from [8], where McQuain and his collabo- 
rators investigated the applicability of iterative methods to linear systems arising 
in circuit simulation. In particular, they studied the solution of systems involving 
the Jacobians that arise when a homotopy algorithm is applied to the computa- 
tion of the DC operating point of a circuit. While these linear systems seem to 
be rather difficult, they are not intractable. In the example, a Jacobian of order 
N = 1853 with 8994 nonzero elements was considered; it is the first Jacobian from 
the IS7B sequence discussed in Section 5.3 of [8]. The right-hand side b was ob- 
tained by moving the last column of the original rectangular nx(n-j-l) Jacobian 
to the other side. The starting guess x 0 was zero, and the starting vector w x was 
set w l = v x — b. The example was run with the variant described in [5] of Saad’s 
ILUT preconditioner [9], with no additional fill-in allowed and a drop tolerance of 
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Figure 2: Convergence curves for Example 6.2. 


0.001, which generated a matrix L with 4766 nonzero elements, and a matrix U 
with 5034 nonzero elements. The algorithm stagnated at 8.7E—23 after 67 steps. 
It built no look-ahead blocks. 

The examples shown illustrate several points. As already noted, the coupled 
QMR algorithm has a rather smooth and almost monotone convergence behavior. 
It makes available a residual norm upper bound that is a very good indicator of 
the convergence of the method. Finally, while it is possible to construct examples 
with arbitrary look-ahead structure, numerical experience seems to indicate that, 
on the average, the look-ahead strategy does not build many look-ahead blocks. 
Indeed, the vast majority of the steps taken by the coupled two-term look-ahead 
Lanczos process are regular steps; furthermore, from the look-ahead steps of size 
greater than 1 taken, the majority are blocks of size 2. 


7 Concluding remarks 

We have presented details of an implementation of a new look-ahead algorithm 
for constructing Lanczos vectors based on coupled two-term recurrences instead 
of the usual three-term recurrences. We then discussed a new implementation of 
the quasi-minimal residual algorithm, using the coupled process to build the basis 
for the Krylov subspace. While the theoretical results derived for the original 
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Iteration Dumber 


Figure 3: Convergence curves for Example 6.3. 


algorithm carry over to the new one, the latter was shown in examples to have 
better numerical properties. 

FORTRAN 77 codes for the coupled-two term look-ahead procedure and the 
resulting new implementation of the QMR algorithm can be obtained electroni- 
cally from the authors (freund@research.att.com or na.nachtigal@na-net.ornl.gov). 
We note that FORTRAN 77 codes for the original implementation of QMR and 
the underlying look-ahead Lanczos algorithm are available from netlib by send- 
ing an email message consisting of the single line “send lalqmr from misc” to 
netlib@ornl.gov or netlib@research.att.com. 
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